Using term similarity measures for classifying short document data
Similar resources
Medical Document Clustering Using Ontology-Based Term Similarity Measures
Recent research shows that ontology as background knowledge can improve document clustering quality with its concept hierarchy knowledge. Previous studies take term semantic similarity as an important measure to incorporate domain knowledge into clustering process such as clustering initialization and term re-weighting. However, not many studies have been focused on how different types of term ...
Similarity Measures for Short Queries
Ad-hoc queries are usually short, of perhaps two to ten terms. However, in previous rounds of TREC we have concentrated on obtaining optimal performance for the long TREC topics. In this paper we investigate the behaviour of similarity measures on short queries, and show experimentally that two successful measures, which give similar, good performance on long TREC topics, do not work well for sho...
Investigating Measures for Pairwise Document Similarity
The need for a more effective similarity measure is growing as a result of the astonishing amount of information being placed online. Most existing similarity measures are defined by empirically derived formulas and cannot easily be extended to new applications. We present a pairwise document similarity measure based on Information Theory, and present corpus dependent and independent applicatio...
Multilevel Measures of Document Similarity
Many applications such as document summarization, passage retrieval and question answering require a detailed analysis of semantic relations between terms within and across documents and sentences. Often one has a number of sentences or paragraphs and has to choose the candidate with the highest level of relevance for the topic or question. An additional requirement may be that the information ...
Similarity Measures for Text Document Clustering
Clustering is a useful technique that organizes a large quantity of unordered text documents into a small number of meaningful and coherent clusters, thereby providing a basis for intuitive and informative navigation and browsing mechanisms. Partitional clustering algorithms have been recognized to be more suitable as opposed to the hierarchical clustering schemes for processing large datasets....
Journal
Journal title: International Journal of Computational Intelligence Studies
Year: 2021
ISSN: 1755-4977, 1755-4985
DOI: 10.1504/ijcistudies.2021.10038081